
    Increasing the Efficiency of 6-DoF Visual Localization Using Multi-Modal Sensory Data

    Localization is a key requirement for mobile robot autonomy and human-robot interaction. Vision-based localization is accurate and flexible; however, it incurs a high computational burden which limits its application on many resource-constrained platforms. In this paper, we address the problem of performing real-time localization in large-scale 3D point cloud maps of ever-growing size. While most systems using multi-modal information reduce localization time by employing side-channel information in a coarse manner (e.g. WiFi for a rough prior position estimate), we propose to interweave the map with rich sensory data. This multi-modal approach achieves two key goals simultaneously. First, it enables us to harness additional sensory data to localise against a map covering a vast area in real time; second, it allows us to roughly localise devices which are not equipped with a camera. The key to our approach is a localization policy based on a sequential Monte Carlo estimator. The localiser uses this policy to attempt point-matching only in nodes where it is likely to succeed, significantly increasing the efficiency of the localization process. The proposed multi-modal localization system is evaluated extensively in a large museum building. The results show that our multi-modal approach not only increases localization accuracy but also significantly reduces computation time.
    Comment: Presented at IEEE-RAS International Conference on Humanoid Robots (Humanoids) 201
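
    To make the policy concrete, below is a minimal, hypothetical Python sketch of a sequential Monte Carlo (particle filter) localization policy: particles carry belief over discrete map nodes, a coarse side-channel likelihood (a toy WiFi position fix here) reweights them, and expensive visual point-matching is attempted only in nodes that accumulate enough belief. All names, the motion model, and the likelihood are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_NODES = 500          # discrete map nodes (e.g. cells of a large building)
N_PARTICLES = 1000
MATCH_THRESHOLD = 0.05 # only point-match in nodes holding >5% of the belief

# Hypothetical topology: nodes laid out on a line, motion moves -1/0/+1 node.
def motion_model(particles):
    return np.clip(particles + rng.integers(-1, 2, size=particles.shape),
                   0, N_NODES - 1)

def wifi_likelihood(particles, rough_fix):
    """Coarse side-channel likelihood: Gaussian around a rough WiFi fix."""
    return np.exp(-0.5 * ((particles - rough_fix) / 10.0) ** 2)

def localization_policy(particles, weights):
    """Return the nodes where expensive visual point-matching should run."""
    belief = np.bincount(particles, weights=weights, minlength=N_NODES)
    return np.flatnonzero(belief > MATCH_THRESHOLD)

particles = rng.integers(0, N_NODES, size=N_PARTICLES)
weights = np.full(N_PARTICLES, 1.0 / N_PARTICLES)

for step in range(20):
    particles = motion_model(particles)
    weights *= wifi_likelihood(particles, rough_fix=250)
    weights /= weights.sum()

    candidate_nodes = localization_policy(particles, weights)
    # In the real system, 3D point-matching would now run only in these
    # nodes, and its outcome would feed back into the particle weights.

    # Systematic resampling keeps the particle set healthy.
    idx = rng.choice(N_PARTICLES, size=N_PARTICLES, p=weights)
    particles = particles[idx]
    weights = np.full(N_PARTICLES, 1.0 / N_PARTICLES)

print("nodes selected for point-matching:",
      localization_policy(particles, weights))
```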

    Dense 3D Object Reconstruction from a Single Depth View

    In this paper, we propose a novel approach, 3D-RecGAN++, which reconstructs the complete 3D structure of a given object from a single arbitrary depth view using generative adversarial networks. Unlike existing work, which typically requires multiple views of the same object or class labels to recover the full 3D geometry, the proposed 3D-RecGAN++ takes only the voxel grid representation of a depth view of the object as input, and is able to generate the complete 3D occupancy grid at a high resolution of 256^3 by recovering the occluded/missing regions. The key idea is to combine the generative capabilities of autoencoders and the conditional Generative Adversarial Network (GAN) framework to infer accurate and fine-grained 3D structures of objects in high-dimensional voxel space. Extensive experiments on large synthetic datasets and real-world Kinect datasets show that the proposed 3D-RecGAN++ significantly outperforms the state of the art in single-view 3D object reconstruction, and is able to reconstruct unseen types of objects.
    Comment: TPAMI 2018. Code and data are available at: https://github.com/Yang7879/3D-RecGAN-extended. This article extends from arXiv:1708.0796
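
    The combination the abstract describes, an autoencoder-style generator conditioned on a partial voxel grid plus a conditional discriminator, can be illustrated at toy scale. The PyTorch sketch below uses 32^3 grids rather than the paper's 256^3; the architecture, layer sizes, and losses are assumptions for illustration, not the published 3D-RecGAN++ network.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a partial occupancy grid (one depth view) to a complete grid."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),   # 32 -> 16
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),  # 16 -> 8
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),    # 8 -> 16
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # 16 -> 32
        )

    def forward(self, partial):
        return self.decoder(self.encoder(partial))

class Discriminator(nn.Module):
    """Conditional critic: sees the partial input together with a completion."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(2, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.Linear(32 * 8 * 8 * 8, 1),
        )

    def forward(self, partial, full):
        return self.net(torch.cat([partial, full], dim=1))

G, D = Generator(), Discriminator()
partial = (torch.rand(4, 1, 32, 32, 32) > 0.9).float()  # fake single-view voxels
target  = (torch.rand(4, 1, 32, 32, 32) > 0.7).float()  # fake complete shapes

fake = G(partial)
# Training would combine a reconstruction loss with the adversarial term:
recon_loss = nn.functional.binary_cross_entropy(fake, target)
adv_loss = nn.functional.binary_cross_entropy_with_logits(
    D(partial, fake), torch.ones(4, 1))
print(recon_loss.item(), adv_loss.item())
```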

    D3P : Data-driven demand prediction for fast expanding electric vehicle sharing systems

    The future of urban mobility is expected to be shared and electric. It is not only a more sustainable paradigm that can reduce emissions, but one that can also bring societal benefits by offering a more affordable on-demand mobility option to the general public. Many car sharing service providers as well as automobile manufacturers are entering the competition by expanding both their EV fleets and renting/returning station networks, aiming to seize a share of the market and to bring car sharing to the zero-emissions level. During this fast expansion, one determinant of success is the ability to predict station demand while the entire system is growing continuously. This demand prediction problem poses several challenges. First, unlike most existing work, which predicts demand only for static systems or at a few stages of expansion, in the real world we often need to predict demand as, or even before, stations are deployed or closed, to provide information and decision support. Second, for new stations yet to be deployed, there is no historical data available to help predict their demand. Finally, the impact of deploying/closing stations on the other stations in the system can be complex. To address these challenges, we formulate the demand prediction problem in the context of fast-expanding electric vehicle sharing systems, and propose a data-driven demand prediction approach which aims to model the expansion dynamics directly from the data. We use a local temporal encoding process to handle the historical data for each existing station, and a dynamic spatial encoding process to take correlations between stations into account with Graph Convolutional Neural Networks (GCN). The encoded features are fed to a multi-scale predictor, which forecasts both the long-term expected demand of the stations and their instant demand in the near future. We evaluate the proposed approach with real-world data collected from a major EV sharing platform for one year. Experimental results demonstrate that our approach significantly outperforms the state of the art, showing up to a three-fold performance gain in predicting demand for the expanding EV sharing systems.
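
    A minimal sketch of the pipeline shape the abstract describes (per-station temporal encoding, GCN-based spatial encoding over the station graph, and a multi-scale head for instant and long-term demand) might look as follows in PyTorch. The dimensions, the GRU encoder, and the toy graph are illustrative assumptions, not the D3P architecture.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution step: A_hat @ X @ W with a normalized adjacency."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj_norm):
        return torch.relu(adj_norm @ self.lin(x))

class DemandPredictor(nn.Module):
    """Temporal encoding per station (GRU) -> spatial encoding (GCN) -> two heads."""
    def __init__(self, hidden=32):
        super().__init__()
        self.temporal = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.spatial = GCNLayer(hidden, hidden)
        self.head_instant = nn.Linear(hidden, 1)    # demand in the near future
        self.head_longterm = nn.Linear(hidden, 1)   # long-term expected demand

    def forward(self, history, adj_norm):
        # history: (n_stations, t_steps, 1); a new station can have all zeros
        _, h = self.temporal(history)               # h: (1, n_stations, hidden)
        z = self.spatial(h.squeeze(0), adj_norm)    # mix correlated stations
        return self.head_instant(z), self.head_longterm(z)

n = 6
adj = torch.eye(n) + (torch.rand(n, n) > 0.7).float()  # toy station graph
adj_norm = adj / adj.sum(1).unsqueeze(1)               # row-normalized adjacency

model = DemandPredictor()
history = torch.rand(n, 24, 1)  # e.g. 24 hourly demand counts per station
instant, longterm = model(history, adj_norm)
print(instant.shape, longterm.shape)  # torch.Size([6, 1]) twice
```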

    Securing Cyber-Physical Social Interactions on Wrist-worn Devices

    Since ancient Greece, handshaking has been commonly practiced between two people as a friendly gesture to express trust and respect, or to form a mutual agreement. In this article, we show that such physical contact can be used to bootstrap secure cyber contact between the smart devices worn by the users. The key observation is that during handshaking, although they belong to two different users, the two hands involved in the shaking event are often rigidly connected and therefore exhibit very similar motion patterns. We propose a novel key generation system which harvests motion data during user handshaking from wrist-worn smart devices such as smartwatches or fitness bands, and exploits the matching motion patterns to generate symmetric keys on both parties. The generated keys can then be used to establish a secure communication channel for exchanging data between devices. This provides a much more natural and user-friendly alternative for many applications, e.g., exchanging/sharing contact details, friending on social networks, or even making payments, since it involves no extra bespoke hardware, nor requires the users to perform pre-defined gestures. We implement the proposed key generation system on off-the-shelf smartwatches, and extensive evaluation shows that it can reliably generate 128-bit symmetric keys after only around 1 s of handshaking (with a success rate >99%), and is resilient to different types of attacks, including mimicking-based impersonation attacks, passive impersonation attacks, and eavesdropping attacks. Specifically, for real-time mimicking attacks, the Equal Error Rate (EER) in our experiments is only 1.6% on average. We also show that the proposed key generation system can be extremely lightweight and is able to run in-situ on resource-constrained smartwatches without incurring excessive resource consumption.
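
    The core idea, extracting matching bits from the two devices' correlated motion, can be sketched as below. The quantizer, sampling setup, and noise model are hypothetical stand-ins, not the paper's scheme; a deployed system would add information reconciliation and privacy amplification before using the key, as the final comment notes.

```python
import numpy as np

rng = np.random.default_rng(42)

def extract_bits(accel, window=8):
    """Hypothetical quantizer: one bit per window of samples, comparing the
    window's mean acceleration magnitude against the trace's median."""
    mag = np.linalg.norm(accel, axis=1)
    usable = mag[: len(mag) // window * window].reshape(-1, window)
    return (usable.mean(axis=1) > np.median(mag)).astype(np.uint8)

# Toy stand-in for a shared handshake: both watches see the same wrist motion
# plus independent sensor noise. 1024 samples / 8-sample windows -> 128 bits.
shared = rng.normal(0.0, 1.0, size=(1024, 3)).cumsum(axis=0)
watch_a = shared + rng.normal(0, 0.05, shared.shape)
watch_b = shared + rng.normal(0, 0.05, shared.shape)

bits_a, bits_b = extract_bits(watch_a), extract_bits(watch_b)
print("key length:", bits_a.size,
      "bit-mismatch rate:", np.mean(bits_a != bits_b))
# A real system would run information reconciliation (error-correcting codes)
# and privacy amplification (hashing) before using the agreed 128-bit key.
```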

    3DCFS : Fast and robust joint 3D semantic-instance segmentation via coupled feature selection

    We propose a novel fast and robust 3D point cloud segmentation framework via coupled feature selection, named 3DCFS, which jointly performs semantic and instance segmentation. Inspired by the human scene-perception process, we design a novel coupled feature selection module, named CFSM, that adaptively selects and fuses the reciprocal semantic and instance features from the two tasks in a coupled manner. To further boost the performance of the instance segmentation task in our 3DCFS, we investigate a loss function that helps the model learn to balance the magnitudes of the output embedding dimensions during training, which makes calculating the Euclidean distance more reliable and enhances the generalizability of the model. Extensive experiments demonstrate that our 3DCFS outperforms state-of-the-art methods on benchmark datasets in terms of accuracy, speed and computational cost.
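
    The abstract does not give the balancing loss in closed form; one plausible reading is a regularizer that discourages any single embedding dimension from dominating the Euclidean distance. The following PyTorch snippet is an assumption-laden sketch of such a term, not the published 3DCFS loss.

```python
import torch

def dimension_balance_loss(embeddings):
    """Hypothetical regularizer in the spirit of 3DCFS's balancing loss:
    penalize dimensions whose average magnitude drifts away from the others,
    so no single dimension dominates Euclidean distances.

    embeddings: (n_points, emb_dim) instance-embedding output of the network.
    """
    per_dim = embeddings.abs().mean(dim=0)  # average magnitude per dimension
    return ((per_dim - per_dim.mean()) ** 2).mean()

emb = torch.randn(2048, 8) * torch.tensor([10., 1, 1, 1, 1, 1, 1, 1])
print(dimension_balance_loss(emb))               # large: dimension 0 dominates
print(dimension_balance_loss(torch.randn(2048, 8)))  # small: balanced
```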

    SCAN : learning speaker identity from noisy sensor data

    Data acquired from multiple sensors simultaneously features increasingly in our ever more pervasive world. Buildings can be made smarter and more efficient, and spaces more responsive to users. A fundamental building block towards smart spaces is the ability to understand who is present in a certain area. A ubiquitous way of detecting this is to exploit people's unique vocal features as they interact with one another. As an example, consider audio features sampled during a meeting, yielding a noisy set of possible voiceprints. Given a number of meetings and knowledge of participation (e.g. through a calendar or MAC addresses), can we learn to associate a specific identity with a particular voiceprint? Enrolling users into a biometric database is obviously time-consuming and is not robust to vocal deviations over time. To address this problem, the standard approach is to perform a clustering step (e.g. of the audio data) followed by a data-association step once identity-rich sensor data is available. In this paper we show that this approach is not robust to noise in either type of sensor stream; to tackle this issue we propose a novel algorithm that jointly optimises the clustering and association process, yielding up to three times higher identification precision than approaches that execute these steps sequentially. We demonstrate the performance benefits of our approach in two case studies: one with acoustic and MAC datasets that we collected from meetings in a non-residential building, and another with an online dataset of recorded radio interviews.
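
    To make the clustering-then-association baseline concrete, the sketch below shows the association half: a maximum-weight one-to-one matching between voiceprint clusters and identities based on co-occurrence across meetings. All data and noise rates are synthetic, and SCAN's contribution is precisely to fold this matching into the clustering objective rather than run it once afterwards, which this sketch does not show.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)

# Toy world: 3 speakers, 20 meetings. Each meeting yields clustered
# voiceprints (cluster ids) and an attendance list (identity ids), both noisy.
n_speakers, n_meetings = 3, 20
true_perm = np.array([2, 0, 1])  # cluster k really belongs to identity true_perm[k]

cooccur = np.zeros((n_speakers, n_speakers))  # rows: clusters, cols: identities
for _ in range(n_meetings):
    present = rng.permutation(n_speakers)[: rng.integers(1, n_speakers + 1)]
    for ident in present:
        cluster = int(np.flatnonzero(true_perm == ident)[0])
        if rng.random() < 0.2:              # noisy sensing: wrong cluster seen
            cluster = int(rng.integers(n_speakers))
        cooccur[cluster, ident] += 1

# Association: maximum-weight one-to-one matching (Hungarian algorithm).
rows, cols = linear_sum_assignment(-cooccur)
print("recovered mapping:", dict(zip(rows.tolist(), cols.tolist())))
print("true mapping     :", {k: int(true_perm[k]) for k in range(n_speakers)})
```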

    Autonomous learning for face recognition in the wild via ambient wireless cues

    Facial recognition is a key enabling component for emerging Internet of Things (IoT) services such as smart homes or responsive offices. Through the use of deep neural networks, facial recognition has achieved excellent performance. However, this is only possible when models are trained with hundreds of images of each user in different viewing and lighting conditions. Clearly, this level of effort in enrolment and labelling is impossible for widespread deployment and adoption. Inspired by the fact that most people carry smart wireless devices with them, e.g. smartphones, we propose to use such wireless identifiers as supervisory labels. This allows us to curate a dataset of facial images that are unique to a certain domain, e.g. a set of people in a particular office. This custom corpus can then be used to fine-tune existing pre-trained models, e.g. FaceNet. However, due to the vagaries of wireless propagation in buildings, the supervisory labels are noisy and weak. We propose a novel technique, AutoTune, which learns and refines the association between a face and a wireless identifier over time by increasing the inter-cluster separation and minimizing the intra-cluster distance. Through extensive experiments with multiple users at two sites, we demonstrate the ability of AutoTune to design an environment-specific, continually evolving facial recognition system entirely without user effort.
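
    A toy version of the refinement loop the abstract alludes to, tightening clusters around noisy wireless labels by alternating centroid estimation and nearest-centroid reassignment, is sketched below. The embeddings, noise rates, and update rule are illustrative assumptions; the real AutoTune additionally fine-tunes the face model itself on the refined labels.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy face embeddings (stand-ins for FaceNet outputs), 2-D for clarity.
true_id = rng.integers(0, 3, size=120)
emb = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])[true_id]
emb += rng.normal(0, 0.6, emb.shape)

# Weak supervision: the wireless identifier seen with each face, with 25% of
# observations flipped by the vagaries of indoor radio propagation.
wifi_label = true_id.copy()
flip = rng.random(120) < 0.25
wifi_label[flip] = rng.integers(0, 3, size=flip.sum())

def refine(emb, labels, n_ids=3):
    """One refinement pass: estimate per-identity centroids from the current
    noisy labels, then reassign every sample to its nearest centroid. This
    shrinks intra-cluster distance and grows inter-cluster separation."""
    centroids = np.stack([
        emb[labels == k].mean(axis=0) if np.any(labels == k) else emb.mean(axis=0)
        for k in range(n_ids)])
    dists = np.linalg.norm(emb[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

labels = wifi_label
for _ in range(5):
    labels = refine(emb, labels)

print("label accuracy from WiFi alone :", np.mean(wifi_label == true_id))
print("label accuracy after refinement:", np.mean(labels == true_id))
```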

    Shake-n-Shack : enabling secure data exchange between smart wearables via handshakes

    Since ancient Greece, handshaking has been commonly practiced between two people as a friendly gesture to express trust and respect, or to form a mutual agreement. In this paper, we show that such physical contact can be used to bootstrap secure cyber contact between the smart devices worn by the users. The key observation is that during handshaking, although they belong to two different users, the two hands involved in the shaking event are often rigidly connected and therefore exhibit very similar motion patterns. We propose a novel system, Shake-n-Shack, which harvests motion data during user handshaking from wrist-worn smart devices such as smartwatches or fitness bands, and exploits the matching motion patterns to generate symmetric keys on both parties. The generated keys can then be used to establish a secure communication channel for exchanging data between devices. This provides a much more natural and user-friendly alternative for many applications, e.g., exchanging/sharing contact details, friending on social networks, or even making payments, since it involves no extra bespoke hardware, nor requires the users to perform pre-defined gestures. We implement the proposed Shake-n-Shack system on off-the-shelf smartwatches, and extensive evaluation shows that it can reliably generate 128-bit symmetric keys after only around 1 s of handshaking (with a success rate >99%), and is resilient to real-time mimicking attacks: in our experiments the Equal Error Rate (EER) is only 1.6% on average. We also show that the proposed Shake-n-Shack system can be extremely lightweight, and is able to run in-situ on resource-constrained smartwatches without incurring excessive resource consumption.
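
    Complementing the bit-extraction sketch given earlier for the companion article, the snippet below illustrates the other ingredient this abstract highlights, resilience to mimicking: a similarity test on the two motion traces that should pass for the two wrists in the same handshake and fail for an imitator. The correlation measure and threshold are hypothetical stand-ins, not the paper's classifier.

```python
import numpy as np

rng = np.random.default_rng(3)

def motion_similarity(a, b):
    """Normalised cross-correlation (Pearson r) of two acceleration-magnitude
    traces, as a hypothetical check that two devices shared the same shake."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))

t = np.linspace(0, 1, 100)                       # ~1 s of samples
shake = np.sin(2 * np.pi * 4 * t) + 0.3 * rng.normal(size=100)
partner = shake + 0.1 * rng.normal(size=100)     # other wrist, same handshake
attacker = np.sin(2 * np.pi * 4 * t + 0.6) + 0.3 * rng.normal(size=100)  # mimic

THRESHOLD = 0.8  # hypothetical; tuned in practice to trade FAR against FRR (EER)
print("partner :", motion_similarity(shake, partner), ">", THRESHOLD)
print("attacker:", motion_similarity(shake, attacker), "<", THRESHOLD)
```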